Method and device for encoding video signal

ABSTRACT

The invention relates to the processing of video signals prior to encoding or other compression operations and, more particularly, to a method for encoding video signals corresponding to a sequence of frames each of which consists of two fields F1 and F2. The proposed method comprises the steps of receiving successive frames of an input video signal, delaying them with a delay of at least “two fields” duration, detecting any dominance change, and adjusting said delay. When a change from an F1 dominance to an F2 dominance is detected, the first field of the first F2 dominant frame is suppressed, and said delay is decreased by a quantity equal to “one field” duration; when a change from an F2 dominance to an F1 dominance is detected, the last field of the last F2 dominant frame is repeated, and the delay is increased again by a quantity equal to “one field” duration.

The invention also relates to a method for encoding a sequence of frames including either video-type images or film-type images, and to an encoding system that carries out said method by incorporating the first solution presented hereinabove. If a film-type sequence is detected, the inverse 3:2 pull-down technique is applied to the input frames; otherwise, said technique is de-activated and replaced by said first solution, i.e. preprocessing according to the type of dominance change.

FIELD OF THE INVENTION

The present invention relates to a method for encoding video signals corresponding to a sequence of frames each of which originally consists of two fields F1 and F2, and to a corresponding encoding device.

BACKGROUND OF THE INVENTION

In a video sequence composed of successive interlaced pictures (or frames), each frame is constituted by a pair of fields F1 and F2, as illustrated in FIG. 1, which shows successive pairs of fields and the associated synchronization signal: each frame comprises a top field F(2n−1) (with n>0), or odd field, and a bottom field F(2n), or even field, the odd fields being of type F1 and the even fields of type F2. When such video fields come out of a video camera or of any other type of video signal generator, for instance at a rate of 50 fields/second (25 frames/second) or 60 fields/second (30 frames/second), the video material has no field dominance. A frame is said to be “F1 dominant” if it is constituted by a first field F1 followed by a second field F2, and “F2 dominant” if it is constituted by a field F2 followed by a field F1.

Field dominance becomes relevant when data are transferred in such a way that frame boundaries must be known and preserved. When the video material is edited at frame boundaries, with a video recorder for example, a decision is made specifying whether the video material is F1 dominant or F2 dominant: FIGS. 3 and 4 respectively show, for a preexisting video material as indicated in FIG. 2, the structure of an F1 dominant video material and of an F2 dominant video material. Once some material has acquired a particular dominance, it must be manipulated with that dominance. Otherwise, a shift can occur in the representation of a frame, as shown in FIG. 5: the first two frames are F1 dominant, but the third one is F2 dominant and composed of two fields which originally did not belong to the same frame. In such a case, encoding is less efficient: a scene cut between the two fields of an encoded frame is costly in terms of bitrate allocation. Moreover, F2 dominance may lead to annoying vertical movement of pictures when a DVD player outputs frames in slow motion or still image mode.

SUMMARY OF THE INVENTION

It is therefore an object of the invention to propose an encoding method in which the above-indicated drawbacks are avoided and the picture quality of any encoded video programme is increased.

To this end, the invention relates to a method such as described in the introductory paragraph of the description and in which the encoding step is preceded by a preprocessing step which comprises the sub-steps of:

-   (A) receiving the successive frames and delaying them with a delay of at least “two fields” duration;
-   (B) adjusting said delay according to the following dominance change criterion:
    -   (a) when a change from an F1 dominance to an F2 dominance is detected, the first field of the first F2 dominant frame is suppressed, said delay being therefore decreased by a quantity equal to “one field” duration;
    -   (b) when a change from an F2 dominance to an F1 dominance is detected, the last field of the last F2 dominant frame is repeated, the delay being therefore further increased by a quantity equal to “one field” duration.

The method thus proposed makes it possible to detect the changes in field dominance and to correct the input sequencing so that the frames can then be encoded correctly.

In an improved embodiment of the invention, in which the sequence of frames is constituted either by film-type images, to which the 3:2 pull-down technique has been applied, or by video-type images consisting of two fields, said method comprises the steps of:

-   (A) detecting that the current sequence is constituted by film-type images;
-   (B) encoding said current sequence, either after said preprocessing step when it is not detected as being of film-type or after implementation, on said current sequence, of the inverse 3:2 pull-down technique if it is detected as being of film-type; and said detecting step comprises the sub-steps of:
    -   (a) defining for two successive fields F(n) and F(n+2) of the same parity a number of pixels N2 such that N2=NTOT−N′2, where NTOT is the number of pixels in a field, N′2 is the number of pixels for which ABS(val F(n)−val F(n+2))<TH2, ABS designates the function “absolute value”, val designates the luminance of a pixel, and TH2 is a first predefined threshold;
    -   (b) comparing the result of the subtraction of two consecutive numbers N2, divided by NTOT, to a second predefined threshold THR;
    -   (c) detecting that the current sequence is constituted by film-type images only when said result is lower than said second threshold, said fields being then considered as equal.

It is also an object of the invention to propose a corresponding encoding device.

To this end, the invention relates to a device for encoding video signals corresponding to a sequence of frames each of which originally consists of two fields F1 and F2, said sequence being constituted either by film-type images, to which the 3:2 pull-down technique has been applied, or by video-type images consisting of two fields, said device comprising:

-   (A) means for detecting in the input sequence of frames a sequence of film-type images;
-   (B) means for receiving the successive frames of the input sequence, delaying each of them with a delay of at least two fields, and adjusting said delay according to the following dominance change criterion:
    -   (a) when a change from an F1 dominance to an F2 dominance is detected, the first field of the first F2 dominant frame is suppressed, said delay being therefore decreased by a quantity equal to “one field” duration;
    -   (b) when a change from an F2 dominance to an F1 dominance is detected, the last field of the last F2 dominant frame is repeated, the delay being therefore increased by a quantity equal to “one field” duration;
-   (C) means for encoding the input sequence of frames, either connected in series with means (B) when said sequence is not detected as being of film-type or after implementation of the inverse 3:2 pull-down technique if it is detected as being of film-type.

BRIEF DESCRIPTION OF THE DRAWINGS

The particularities of the invention will now be explained in a more detailed manner, with reference to the accompanying drawings in which:

FIG. 1 shows, at a rate given by the associated synchronization signal on the time axis, a video sequence constituted by successive pairs of fields;

FIG. 2 shows the successive frames F1, F2 of a preexisting video material;

FIGS. 3 and 4 illustrate the structure of F1 dominant and F2 dominant video material;

FIG. 5 illustrates the case of a video sequence in which a shift in the representation of the frames has occurred;

FIG. 6 shows an embodiment of a preprocessing device according to the invention;

FIG. 7 illustrates the mechanism according to which the sequence is modified by suppression or repetition of a field, in relation with the type of dominance detection carried out in the preprocessing device;

FIG. 8 illustrates the 3:2 pull-down technique, which makes it possible to construct a sequence of five interlaced frames, or pairs of fields F(n) to F(n+9), with n=1 in the present case, from four original sequential frames;

FIG. 9 shows how fields are sequenced for the film mode format and illustrates the set of tests (identical or not?) to be carried out for the detection of a 3:2 pull-down structure;

FIG. 10 shows an encoding system in which the method according to the invention is implemented;

FIG. 11 is an implementation of a preprocessing device comprised in the encoding device of FIG. 10.

DETAILED DESCRIPTION OF THE INVENTION

An example of implementation of a preprocessing device according to the invention (before coding in a coding device 1003) is illustrated in FIG. 6, in the case where the input video stream is a sequence composed of information corresponding to images of the video type, i.e. composed (as already shown in FIG. 1) of successive pairs of fields F(1), F(2), . . . , F(i), . . . and so on.

Such a sequence is assumed to be F1 dominant, which corresponds in FIG. 6 to the upper position of a switch 61; each successive input field IF is then delayed in a memory 63, with a delay of two fields, or at least two fields (this delay is illustrated in line (b) of FIG. 7 for frames 1 to 3, by comparison with the corresponding frames of line (a)). When a change from “F1 dominant” to “F2 dominant” is detected by means of a circuit 64 for the detection of a field dominance change (instant t12 in line (a) of FIG. 7), the switch 61, controlled by this circuit 64, moves to its lower position (see FIG. 6), for which each successive input field IF is now delayed in a memory 65, with a delay of only one field (or one field less, in the case of a greater delay for the memory 63). The first field of the first frame with F2 dominance is suppressed, and all the subsequent input fields are now delivered with only a “one field” duration delay (see the frames 4 and 5 in line (b) of FIG. 7), so that no gap occurs in the output sequence.

When a further change from “F2 dominant” to “F1 dominant” is detected by the circuit 64 (instant t21 in line (a) of FIG. 7), the last field F1 of the last F2 dominant frame is repeated in order to restore a correct sequencing: all the subsequent input fields are now, as initially, delivered again with a “two fields” duration delay (see the frames 6 and 7 in line (b) of FIG. 7), or one field more in the case of a greater delay for the memory 63.
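The suppression and repetition mechanism described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the circuit of FIG. 6: the function name `preprocess_fields`, the event labels `"F1_TO_F2"` and `"F2_TO_F1"`, and the convention that each event is keyed by the index of the first field of the first frame having the new dominance are all hypothetical. Fields are represented by arbitrary labels, and the delay is counted in field durations.

```python
from typing import Dict, List, Tuple


def preprocess_fields(
    fields: List[str],
    changes: Dict[int, str],
    base_delay: int = 2,
) -> List[Tuple[str, int]]:
    """Return (field, delay) pairs after dominance-change correction.

    `changes` maps an input field index to "F1_TO_F2" or "F2_TO_F1"
    (hypothetical labels); the index is that of the first field of the
    first frame having the new dominance.
    """
    delay = base_delay
    out: List[Tuple[str, int]] = []
    for index, field in enumerate(fields):
        kind = changes.get(index)
        if kind == "F1_TO_F2":
            # Suppress the first field of the first F2 dominant frame:
            # one field less is delivered, so the delay shrinks by one.
            delay -= 1
            continue
        if kind == "F2_TO_F1" and out:
            # Repeat the last delivered field (last field of the last
            # F2 dominant frame): the delay grows back by one field.
            delay += 1
            out.append((out[-1][0], delay))
        out.append((field, delay))
    return out


stream = ["a", "b", "c", "d", "e", "f", "g", "h"]
result = preprocess_fields(stream, {2: "F1_TO_F2", 6: "F2_TO_F1"})
print(result)
```

In this toy run, field "c" is dropped and field "f" is delivered twice, so the output has as many fields as the input and the delay returns to its initial value after the second change.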

The detection of dominance in the field dominance change detection circuit 64 is for instance made through the use of a scene cut detection method, carried out between consecutive fields. Such a method is described for example in documents such as “Hierarchical scene change detection in an MPEG-2 compressed video sequence”, by T. Shin et al., Proceedings of the 1998 IEEE ISCAS, May 31, 1998, Monterey, Ca., USA, pp. IV-253 to IV-256, or “A unified approach to shot change detection and camera motion characterization”, by P. Bouthemy et al., IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 7, October 1999, pp. 1030–1044.

An improved embodiment of the invention may also be proposed in the following case. In the NTSC standard, the picture frequency is 30 interlaced frames per second. However, for movies, the frames are produced at a frame rate of 24 Hz. When it is required to visualize a sequence of film-type images on television, it is therefore necessary to convert the movie's frame rate to the NTSC standard. The technique currently used, which is known as “3:2 pull-down” and is described for instance in the international patent application WO 97/39577, consists of creating five interlaced frames (which can therefore be visualized on television) based on four original sequential film frames. This is obtained by dividing each of these four sequential frames in two, so as to form four odd and four even fields, and by duplicating two of these eight fields.

As illustrated in FIG. 8, which shows a film sequence at 24 Hz on the first line and illustrates on the second line how to organize the field sequencing of a corresponding video sequence at 30 Hz, this means that an additional field is inserted for each pair of film frames, for instance by splitting one film frame out of two into three fields, the other one being split as usual into two fields. In the case of the frame split into three fields (for instance, G1G2 split into F1, F2, F3, or G5G6 split into F6, F7, F8), the third one is obtained by copying the odd field (F1) or the even field (F6) alternately, in order to keep the “odd/even” sequencing. The result is the following:

-   F1=F3=G1
-   F2=G2
-   F4=G4
-   F5=G3
-   F6=F8=G6
-   F7=G5
-   F9=G7
-   F10=G8, and so on.

These two additional fields obtained by duplication constitute redundant information. When encoding such sequences according to the MPEG-2 standard, it is useful to detect said information: the suppression of these repeated fields then frees some space to better encode the others, the concerned MPEG-2 encoder thus receiving video-type image sequences at 30 Hz and original film-type image sequences at 24 Hz.
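The field sequencing listed above can be reproduced with a short Python sketch. It is only an illustration: the function name `pulldown_32` and the representation of a film frame as an (odd field, even field) pair of labels are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def pulldown_32(film_frames: Sequence[Tuple[str, str]]) -> List[str]:
    """Expand film frames, given as (odd_field, even_field) pairs,
    into a 3:2 pull-down field sequence with alternating parity."""
    fields: List[str] = []
    for k, (odd, even) in enumerate(film_frames):
        # The next output slot is odd when an even number of fields
        # has been emitted so far, which keeps the odd/even sequencing.
        if len(fields) % 2 == 0:
            first, second = odd, even
        else:
            first, second = even, odd
        fields.extend([first, second])
        if k % 2 == 0:
            # One film frame out of two is split into three fields:
            # its first emitted field is copied as the third one.
            fields.append(first)
    return fields


film = [("G1", "G2"), ("G3", "G4"), ("G5", "G6"), ("G7", "G8")]
video_fields = pulldown_32(film)
print(video_fields)
# ['G1', 'G2', 'G1', 'G4', 'G3', 'G6', 'G5', 'G6', 'G7', 'G8']
```

Four film frames yield ten fields, i.e. five interlaced frames, with F1=F3 and F6=F8 being the duplicated fields.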

A usual criterion to detect automatically sequences coming from movies (film-type image sequences) is therefore the following: a structure of five frames, i.e. of ten fields, is analyzed by means of a subtraction of consecutive fields of the same parity. The condition to detect the 3:2 pull-down structure is the following:

-   F1=F3
-   F2≠F4
-   F3≠F5
-   F4≠F6
-   F5≠F7
-   F6=F8
-   F7≠F9
-   F8≠F10,

which is illustrated in the sequence of FIG. 9, where f1, f2, . . . designate the successive frames, 1o–1e, 1o–2e, 2o–3e, . . . the corresponding pairs of fields, y the reply “yes” to the comparison test (i.e. fields equal), and n the reply “no” (i.e. fields different). If all these conditions are satisfied, then the inverse 3:2 pull-down conversion is performed on a group of five frames; conversely, if one of these conditions is not valid, the encoder goes back to the video mode (no elimination of two fields).
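The eight conditions above lend themselves to a compact check. The following Python sketch is illustrative only: the names `is_pulldown_group` and `drop_repeated_fields` are hypothetical, the default comparison is strict equality on field labels, and the inverse step shown here merely removes the two repeated fields (F3 and F8) without reordering the remaining ones.

```python
from typing import Callable, List, Sequence

# Expected outcome of comparing F(i) with F(i+2) for i = 1..8:
# only F1=F3 and F6=F8 are equal in a 3:2 pull-down group.
EXPECTED = [True, False, False, False, False, True, False, False]


def is_pulldown_group(
    fields: Sequence[object],
    equal: Callable[[object, object], bool] = lambda a, b: a == b,
) -> bool:
    """Check the 3:2 pull-down pattern on a group of ten fields."""
    if len(fields) != 10:
        return False
    return all(
        equal(fields[i], fields[i + 2]) == want
        for i, want in enumerate(EXPECTED)
    )


def drop_repeated_fields(fields: Sequence[object]) -> List[object]:
    """Remove the repeated fields F3 and F8 (indices 2 and 7)."""
    return [f for i, f in enumerate(fields) if i not in (2, 7)]


group = ["G1", "G2", "G1", "G4", "G3", "G6", "G5", "G6", "G7", "G8"]
video = ["V%d" % i for i in range(1, 11)]
print(is_pulldown_group(group), is_pulldown_group(video))
print(drop_repeated_fields(group))
```

The `equal` parameter is left pluggable so that a noise-tolerant comparison can replace strict equality.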

However, due to the possible presence of noise in the original 3:2 pull-down sequence, the equality criterion between two fields (F1, F3 and F6, F8) may not be strictly verified. Two fields of the same parity F(N) and F(N+2) are considered. If NTOT designates the total number of pixels in a field (172800 for a full resolution), val(F(N)) designates the luminance value for a given pixel, N1 is the number of picture elements (pixels) such that ABS[val(F(N))−val(F(N+2))]>THRES1, Nm is the number of pixels such that ABS[val(F(N))−val(F(N+2))]<THRES2, N2 is the number of pixels such that N2=NTOT−Nm, and THRES1, THRES2 are predetermined thresholds, then the following test, Ratio1 and Ratio2 being values previously chosen, is carried out:

    IF ((N1<Ratio1) and (N2<Ratio2))
    THEN: F(N)=F(N+2)
    ELSE: F(N)≠F(N+2)

The first criterion (N1<Ratio1) may be called “the dissimilarity criterion” and involves the number of pixels where the field-to-field pixel difference is large, while the second one (N2<Ratio2) may be called “the likeness criterion” and involves the number of pixels where the field-to-field pixel difference is small.
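The two-criterion test can be written out as a small Python function. This is a sketch under assumptions: fields are modelled as flat sequences of luminance values, the function name `fields_equal` is hypothetical, and the threshold and ratio values used in the example calls are arbitrary illustrations, not values taken from the description.

```python
from typing import Sequence


def fields_equal(
    field_a: Sequence[int],
    field_b: Sequence[int],
    thres1: int,
    thres2: int,
    ratio1: int,
    ratio2: int,
) -> bool:
    """Noise-tolerant equality test between two fields of same parity."""
    if len(field_a) != len(field_b):
        raise ValueError("fields must have the same number of pixels")
    ntot = len(field_a)
    diffs = [abs(a - b) for a, b in zip(field_a, field_b)]
    n1 = sum(1 for d in diffs if d > thres1)  # large differences
    nm = sum(1 for d in diffs if d < thres2)  # small differences
    n2 = ntot - nm
    # Dissimilarity criterion (N1) and likeness criterion (N2).
    return n1 < ratio1 and n2 < ratio2


flat = [100] * 10
noisy_copy = [101] * 10            # same picture, slight noise
other = [100] * 5 + [160] * 5      # half of the pixels changed

# Threshold and ratio values below are arbitrary, for illustration.
print(fields_equal(flat, noisy_copy, thres1=20, thres2=5, ratio1=1, ratio2=2))
print(fields_equal(flat, other, thres1=20, thres2=5, ratio1=1, ratio2=2))
```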

Problems within the film mode detection step may consequently occur mostly in the following two contrasting situations. For static or quasi-static sequences, the dissimilarity criterion is no longer verified, since the fields are nearly all equal, and this criterion may therefore be suppressed, the residual conditions that need to be fulfilled then being only F1=F3 and F6=F8. For a very noisy sequence, however, in which two identical fields may nevertheless seem unlike, the threshold setting the likeness criterion cannot be increased too much, otherwise fields that are different could be considered as identical. The criterion for detecting automatically sequences coming from movies may then be modified on the basis of the following remark. By looking at the N2 statistics (N2 has been defined hereinabove), the applicant has noticed that N2 for fields F1 and F3 (referenced N2[1,3]) and N2 for fields F6 and F8 (referenced N2[6,8]) are small compared to the others (more generally, N2[i,j] stands for the statistics of N2 calculated for Fj−Fi). Then, by computing the difference between two consecutive N2 statistics, for instance N2[6,8]−N2[5,7], and comparing such a difference, in the form of a percentage, to a predetermined threshold (according to an expression of the following form: (N2[5,7]−N2[6,8])×100/NTOT for example), a large percentage value is obtained every five computations. Therefore, if the computed percentage is less than X %, with for instance X=30%, then both fields (of the last considered pair of fields) are considered as equal, and the inverse 3:2 pull-down processing is carried out for the next five frames.
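The N2-based computation can be illustrated with the following Python sketch. Several points are assumptions: the helper names `n2_statistic` and `n2_percent_steps` are hypothetical, fields are flat sequences of luminance values, and the difference is taken as "current minus previous" (the order of N2[6,8]−N2[5,7]), whereas the text also gives the opposite order. The comparison with the threshold X is deliberately left to the caller.

```python
from typing import List, Sequence


def n2_statistic(
    field_a: Sequence[int], field_b: Sequence[int], thres2: int
) -> int:
    """N2 = NTOT - Nm, Nm being the count of pixels whose absolute
    luminance difference is below thres2."""
    if len(field_a) != len(field_b):
        raise ValueError("fields must have the same number of pixels")
    nm = sum(1 for a, b in zip(field_a, field_b) if abs(a - b) < thres2)
    return len(field_a) - nm


def n2_percent_steps(n2_values: Sequence[int], ntot: int) -> List[float]:
    """Differences between consecutive N2 statistics, as a percentage
    of NTOT (assumed order: current minus previous)."""
    return [
        (cur - prev) * 100.0 / ntot
        for prev, cur in zip(n2_values, n2_values[1:])
    ]


# Toy N2 series for NTOT = 100: one pair of fields is nearly identical.
series = [80, 80, 80, 80, 5, 80]
steps = n2_percent_steps(series, ntot=100)
print(steps)
```

With this toy series, the step reaching the nearly identical pair stands out from the others in magnitude, which is the periodic feature the detection relies on.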

An encoding system in which this preprocessing operation is included is described with reference to FIG. 10. This encoding system comprises means 101 for encoding input signals corresponding to a sequence either coming from movies or of video type, means 102 for detecting in said input signals a sequence of film type (said detecting means being a detecting stage activated as explained later), and means 103 for switching, only when such a detection has occurred, from a first to a second mode of operation of the encoding means 101. The encoding means 101 comprise a first preprocessing device 1011, a second preprocessing device 1012, and a coding device 1013, for instance an MPEG-2 coder.

The detecting stage, illustrated in FIG. 11, itself comprises a set of subtractors 141.1, 141.2, 141.3, . . . , each provided for receiving two successive fields of the same parity and determining per pixel the difference between these fields, followed by a set of circuits 142.1, 142.2, 142.3, . . . , provided for taking the absolute value of said difference; this value is stored in a memory 143.1, 143.2, 143.3, . . . , respectively. The successive differences between the successive values of these stored absolute values are then computed in subtractors 144.1, 144.2, 144.3, . . . , and these differences, for instance multiplied by 100/NTOT as indicated above, are compared to the predefined threshold (tests C1). If the fields are equal, i.e. they correspond to film-type images (in the present case, for F1=F3 and for F6=F8), an inverse 3:2 pull-down processing can be carried out for the next five frames, in the first preprocessing device 1011; this situation corresponds to the lower position of the switching means 103. When this is not the case (video-type images), the switching means 103 are in the opposite position (upper position). The device 1011 is then de-activated, and at the same time the second preprocessing device 1012 becomes active (this device 1012 has exactly the same structure as the preprocessing device of FIG. 6).

An encoding system corresponding to this last description may be used for transmitting animated images with television systems operating at a frequency of 60 hertz (for instance with the NTSC standard used in countries such as Japan or the United States of America).

1. A method for encoding video signals corresponding to a sequence of frames each of which originally consists of two fields F1 and F2, in which the encoding step is preceded by a preprocessing step which itself comprises the sub-steps of: (A) receiving the successive frames and delaying each of them with a delay of at least two fields; (B) adjusting said delay according to the following dominance change criterion: (a) when a change from an F1 dominance to an F2 dominance is detected, the first field of the first F2 dominant frame is suppressed, said delay being therefore decreased by a quantity equal to “one field” duration; (b) when a change from an F2 dominance to an F1 dominance is detected, the last field of the last F2 dominant frame is repeated, the delay being therefore increased by a quantity equal to “one field” duration.
2. The method according to claim 1, said sequence of frames being constituted either by film-type images, to which a 3:2 pull-down technique has been applied, or by video-type images consisting of two fields, said method comprising the steps of: (A) detecting that the current sequence is constituted by film-type images; (B) encoding said current sequence, either after said preprocessing step when it is not detected as being of film-type or after implementation, on said current sequence, of the inverse 3:2 pull-down technique if it is detected as being of film-type; and said detecting step comprising the sub-steps of: (a) defining for two successive fields F(n) and F(n+2) of the same parity a number of pixels N2 such that N2=NTOT−N′2, where NTOT is the number of pixels in a field, N′2 is the number of pixels for which ABS(val F(n)−val F(n+2))<TH2, ABS designates the function “absolute value”, val designates the luminance of a pixel, and TH2 is a first predefined threshold; (b) comparing the result of the subtraction of two consecutive numbers N2, divided by NTOT, to a second predefined threshold THR; (c) detecting that the current sequence is constituted by film-type images only when said result is lower than said second threshold, said fields being then considered as equal.
3. A device for encoding video signals corresponding to a sequence of frames each of which originally consists of two fields F1 and F2, said sequence being constituted either by film-type images, to which a 3:2 pull-down technique has been applied, or by video-type images consisting of two fields, said device comprising: (A) means for detecting in the input sequence of frames a sequence of film-type images; (B) means for receiving the successive frames of the input sequence, delaying each of them with a delay of at least two fields, and adjusting said delay according to the following dominance change criterion: (a) when a change from an F1 dominance to an F2 dominance is detected, the first field of the first F2 dominant frame is suppressed, said delay being therefore decreased by a quantity equal to “one field” duration; (b) when a change from an F2 dominance to an F1 dominance is detected, the last field of the last F2 dominant frame is repeated, the delay being therefore increased by a quantity equal to “one field” duration; (C) means for encoding the input sequence of frames, either connected in series with means (B) when said sequence is not detected as being of film-type or after implementation of the inverse 3:2 pull-down technique if it is detected as being of film-type.
4. The device according to claim 3, in which said detecting means comprise a set of subtractors, each provided for receiving two successive fields of the same parity and determining per pixel the difference between these fields, and followed by a set of circuits provided for taking the absolute value of said difference and storing it, computing in subtractors the successive differences between the successive values of these stored absolute values, comparing these differences to a predefined threshold, and detecting a sequence of film-type only when said difference is lower than a predefined threshold, said fields being then considered as equal.
5. A system for pre-processing video signals corresponding to a sequence of frames each of which originally consists of two fields F1 and F2, prior to encoding, said system comprising: a processor in communication with a memory, said processor executing code for: (A) receiving said successive frames and delaying each of them with a delay of at least two fields; (B) adjusting said delay according to the following dominance change criterion: (a) when a change from an F1 dominance to an F2 dominance is detected, the first field of the first F2 dominant frame is suppressed, said delay being therefore decreased by a quantity equal to “one field” duration; (b) when a change from an F2 dominance to an F1 dominance is detected, the last field of the last F2 dominant frame is repeated, the delay being therefore increased by a quantity equal to “one field” duration.
6. The system according to claim 5, wherein said sequence of frames is constituted either by film-type images, to which a 3:2 pull-down technique has been applied, or by video-type images consisting of two fields, and wherein the processor further executes code for: (A) detecting that the current sequence is constituted by film-type images; (B) encoding said current sequence, either after said preprocessing step when it is not detected as being of film-type or after implementation, on said current sequence, of the inverse 3:2 pull-down technique if it is detected as being of film-type; and (C) detecting that the current sequence is constituted by film-type images only when said result is lower than said second threshold, said fields being then considered as equal.
7. The system according to claim 6, wherein the code for detecting in step (A) further comprises code for: (a) defining for two successive fields F(n) and F(n+2) of the same parity a number of pixels N2 such that N2=NTOT−N′2, where NTOT is the number of pixels in a field, N′2 is the number of pixels for which ABS(val F(n)−val F(n+2))<TH2, ABS designates the function “absolute value”, val designates the luminance of a pixel, and TH2 is a first predefined threshold; and (b) comparing the result of the subtraction of two consecutive numbers N2, divided by NTOT, to a second predefined threshold THR.